
    T-lex3: An accurate tool to genotype and estimate population frequencies of transposable elements using the latest short-read whole genome sequencing data

    Motivation: Transposable elements (TEs) constitute a significant proportion of most genomes sequenced to date and are responsible for a considerable fraction of the genetic variation within and among species. Accurate genotyping of TEs is therefore crucial for a complete identification of the genetic differences among individuals, populations and species. Results: In this work, we present a new version of T-lex, a computational pipeline that accurately genotypes and estimates the population frequencies of reference TE insertions using short-read high-throughput sequencing data. In this new version, we have redesigned the T-lex algorithm to integrate the BWA-MEM short-read aligner, which is one of the most accurate short-read mappers and can handle longer short reads (e.g., reads >150 bp). We have added new filtering steps to increase genotyping accuracy, and new parameters that allow the user to control the minimum and maximum number of reads, and the minimum number of strains, required to genotype a TE insertion. To test the accuracy of T-lex3, we called 1630 individual TE insertions in Drosophila melanogaster, 1600 in humans, and 3067 in the rice genome, showing for the first time that T-lex3 provides accurate TE calls in a plant genome. This new version of T-lex is a broadly applicable and accurate tool for genotyping and estimating TE frequencies in organisms with different genome sizes and different TE contents. Availability and implementation: T-lex3 is available on GitHub: https://github.com/GonzalezLab/T-lex3
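
    Per-strain genotyping followed by a population frequency estimate can be illustrated with a minimal Python sketch. It assumes that reads supporting the presence or absence of an insertion have already been counted for each strain. The function names, the genotype labels, the default thresholds (min_reads=3, max_reads=90, min_strains=5) and the rule that polymorphic strains are excluded from the frequency are all hypothetical choices for illustration, not T-lex3's actual interface or algorithm.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical genotype labels; T-lex3's real output format may differ.
PRESENT = "present"
ABSENT = "absent"
POLYMORPHIC = "polymorphic"
NO_DATA = "no_data"


def call_strain(presence_reads: int, absence_reads: int,
                min_reads: int = 3, max_reads: int = 90) -> str:
    """Genotype one strain from reads supporting presence vs absence.

    min_reads / max_reads are hypothetical stand-ins for the minimum and
    maximum read-count parameters described in the abstract.
    """
    total = presence_reads + absence_reads
    # Too few reads: insufficient evidence. Too many: possible duplication
    # or mapping artifact. In both cases no genotype is assigned.
    if total == 0 or total < min_reads or total > max_reads:
        return NO_DATA
    if presence_reads > 0 and absence_reads > 0:
        return POLYMORPHIC
    return PRESENT if presence_reads > 0 else ABSENT


def population_frequency(calls: Iterable[str],
                         min_strains: int = 5) -> Optional[float]:
    """Fraction of informative strains that carry the insertion.

    Assumption: only PRESENT/ABSENT calls are informative; POLYMORPHIC and
    NO_DATA strains are excluded. Returns None when fewer than
    min_strains strains are informative.
    """
    counts = Counter(calls)
    informative = counts[PRESENT] + counts[ABSENT]
    if informative < min_strains:
        return None
    return counts[PRESENT] / informative
```

    For example, seven strains with (presence, absence) read counts of (10, 0), (0, 12), (8, 0), (5, 5), (1, 0), (7, 0) and (0, 9) yield three present, two absent, one polymorphic and one no-data call, so the estimated frequency is 3/5 = 0.6; raising min_strains to 6 makes the function return None.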

    Compact genome of the Antarctic midge is likely an adaptation to an extreme environment

    The midge, Belgica antarctica, is the only insect endemic to Antarctica, and thus it offers a powerful model for probing responses to extreme temperatures, freeze tolerance, dehydration, osmotic stress, ultraviolet radiation and other forms of environmental stress. Here we present the first genome assembly of an extremophile, the first dipteran in the family Chironomidae, and the first Antarctic eukaryote to be sequenced. At 99 megabases, B. antarctica has the smallest insect genome sequenced thus far. Although it has a similar number of genes to other Diptera, the midge genome has very low repeat density and reduced intron length. Environmental extremes appear to constrain genome architecture, not gene content. The few transposable elements present are mainly ancient, inactive retroelements. An abundance of genes associated with development, regulation of metabolism and responses to external stimuli may reflect adaptations for surviving in this harsh environment.

    T-lex: a program for fast and accurate assessment of transposable element presence using next-generation sequencing data

    Transposable elements (TEs) are repetitive DNA sequences that are ubiquitous, extremely abundant and dynamic components of practically all genomes. Much effort has gone into the annotation of TE copies in reference genomes. The reduction in sequencing costs and the newly available next-generation sequencing (NGS) data from multiple strains within a species offer an unprecedented opportunity to study the population genomics of TEs in a range of organisms. Here, we present a computational pipeline (T-lex) that uses NGS data to detect the presence or absence of annotated TE copies. T-lex can use data from a large number of strains and returns estimates of the population frequencies of individual TE insertions in a reasonable time. We experimentally validated the accuracy of T-lex in detecting the presence or absence of 768 previously identified TE copies in two resequenced Drosophila melanogaster strains. Approximately 95% of the TE insertions were detected, with 100% sensitivity and 97% specificity. We show that even at low levels of coverage T-lex produces accurate results for the TE copies it can identify reliably, but that the rate of ‘no data’ calls increases as coverage falls below 15×. T-lex is a broadly applicable and flexible tool that can be used with any genome, provided that a reference genome, an annotation of individual TE copies and NGS data are available.
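
    The accuracy figures above rest on the standard definitions of sensitivity, TP/(TP + FN), and specificity, TN/(TN + FP). The sketch below encodes them and adds a simple Poisson coverage model that suggests why 'no data' calls become more frequent at low coverage. The model is an assumption for illustration, not T-lex's method: reads are taken to start uniformly along the genome, a TE/flank junction is counted as informative only when a read overlaps it by at least min_overlap bp on each side, and a call is assumed to need at least k such reads. The read length, overlap and k values are illustrative, and the function names are hypothetical.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly present TEs that were called present."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of truly absent TEs that were called absent."""
    return tn / (tn + fp)


def expected_spanning_reads(coverage: float, read_len: int,
                            min_overlap: int) -> float:
    """Expected reads crossing a junction with >= min_overlap bp on each side.

    Assumes uniform read starts (Lander-Waterman style): reads start at a
    rate of coverage / read_len per bp, and read_len - 2*min_overlap + 1
    start positions give a qualifying overlap.
    """
    start_positions = max(read_len - 2 * min_overlap + 1, 0)
    return coverage * start_positions / read_len


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i)
                     for i in range(k))
```

    With 100 bp reads, a 15 bp minimum overlap and k = 3, this model gives a probability above 0.99 of obtaining enough junction reads at 15× but below 0.7 at 5×, which is consistent in direction with the rise in 'no data' calls at low coverage.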

    A call for benchmarking transposable element annotation methods.

    DNA derived from transposable elements (TEs) constitutes large parts of the genomes of complex eukaryotes, with major impacts not only on genomic research but also on how organisms evolve and function. Although a variety of methods and tools have been developed to detect and annotate TEs, there are as yet no standard benchmarks, that is, no standard way to measure or compare their accuracy. This lack of accuracy assessment calls into question conclusions from a wide range of research that depends explicitly or implicitly on TE annotation. In the absence of standard benchmarks, toolmakers are impeded in improving their tools, annotators cannot properly assess which tools might best suit their needs, and downstream researchers cannot judge how accuracy limitations might affect their studies. We therefore propose that the TE research community create and adopt standard TE annotation benchmarks, and we call on other researchers to join us in making this long-overdue effort a success.
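
    As an illustration of what a benchmark metric could look like, the sketch below scores a TE annotation against a reference annotation at the nucleotide level: sensitivity is the fraction of benchmark TE bases covered by the prediction, and precision is the fraction of predicted TE bases that fall within benchmark TEs. This is one possible metric chosen for illustration, not one proposed by the authors; it assumes half-open (start, end) intervals on a single sequence and ignores TE family labels. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one sequence


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def total_length(intervals: List[Interval]) -> int:
    return sum(end - start for start, end in intervals)


def overlap_length(a: List[Interval], b: List[Interval]) -> int:
    """Bases shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def nucleotide_accuracy(predicted: List[Interval],
                        benchmark: List[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of predicted vs benchmark TE bases."""
    pred, bench = merge(predicted), merge(benchmark)
    shared = overlap_length(pred, bench)
    return shared / total_length(bench), shared / total_length(pred)
```

    For benchmark intervals (0, 100) and (200, 300) and predicted intervals (50, 150) and (200, 250), 100 of the 200 benchmark bases are recovered (sensitivity 0.5) and 100 of the 150 predicted bases are correct (precision about 0.67).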

    Study of repeat dynamics in eukaryotic genomes: from their formation to their elimination

    From bacteria to humans, repeated sequences, whether interspersed or in tandem, can make up more than 90% of a genomic sequence. Despite their impact on the evolution and plasticity of eukaryotic genomes, their mechanisms of propagation remain unclear. The continuous insertion of new copies should cause genome size to increase steadily, yet this does not appear to happen. Which selection pressures regulate genome size? Do they act with the same strength in euchromatic and heterochromatic regions? To study repeat dynamics, I first developed computational pipelines to detect segmental duplications (SDs) and tandem repeat arrays (TRs). The features of the SDs detected in the Drosophila melanogaster genome led us to propose non-allelic homologous recombination as a model of SD formation. This process can be induced by repeats such as transposable elements (TEs); indeed, I found traces of TEs at SD breakpoints. To understand the relationship between repeats and chromatin structure, we then compared the evolutionary dynamics of repeats in the heterochromatic and euchromatic domains of Arabidopsis thaliana. We constructed phylogenetic trees of repeats to estimate their divergence in each domain. The tree topologies of TE families indicate that transposition occurs in bursts. To explain the differences in size and divergence of TE copies between the two chromatin domains, we estimated the strength of repeat elimination in each. Our analysis suggests that in euchromatin, selection acting on genes drives repeat elimination, whereas in heterochromatin the paucity of genes allows a high TE density to be maintained. However, estimates of the DNA loss rate indicate an equally fast turnover in both chromatin domains. We propose that non-allelic homologous recombination may play a significant role in counteracting TE insertion in heterochromatin, because this process can rapidly eliminate many copies.

    A model of segmental duplication formation in Drosophila melanogaster

    Segmental duplications (SDs) are low-copy repeats of DNA segments that have long been recognized as being involved in genome organization and evolution. To date, however, the mechanism of their formation remains obscure. We propose a model for SD formation that we name “duplication-dependent strand annealing” (DDSA). This model is a variant of the synthesis-dependent strand annealing (SDSA) model, a model of homologous repair of double-strand breaks (DSBs). In the Drosophila melanogaster genome, DSBs are repaired primarily by homologous repair, preferentially through the SDSA pathway. The DDSA model predicts that, after a DSB, the search for an ectopic homologous region (here, a repeat) initiates the repair. As predicted by the model, analysis of the SDs detected computationally in the D. melanogaster genome reveals a strong enrichment of transposable elements at SD ends. It also shows that SDs are preferentially located in heterochromatic regions. The model has the further advantage of predicting specific traces left during synthesis. The observed traces support DDSA as one mechanism of SD formation in the D. melanogaster genome. Analysis of these DDSA signatures further suggests that the dissociated strand is sequestered in the repair complex.

    BREC: An R package/Shiny app for automatically identifying heterochromatin boundaries and estimating local recombination rates along chromosomes

    Motivation: Meiotic recombination is a vital biological process that plays an essential role in the structural and functional dynamics of genomes. Genomes exhibit highly variable recombination profiles along chromosomes, associated with different chromatin states. However, euchromatin-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. As a result, accurate local recombination rates, which are needed to address evolutionary questions, are lacking. Results: Here, we propose an automated computational tool, based on the Marey map method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is not genome-specific and can run on non-model genomes as long as genetic and physical maps are available. BREC is purely statistical and data-driven, so good input data quality remains a strong requirement; a data pre-processing module (data quality control and cleaning) is therefore provided. Experiments show that BREC handles variation in marker density and distribution. BREC's heterochromatin boundaries have been validated against experimentally generated cytological equivalents in the fruit fly Drosophila melanogaster genome, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We provide BREC as an R package and a user-friendly, web-based Shiny application, yielding a fast, easy-to-use, and broadly accessible resource.
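
    The Marey map method relates the genetic positions of markers (cM) to their physical positions (Mb); the local recombination rate is the slope of that curve, in cM/Mb. The sketch below estimates the slope with a sliding-window linear regression and then reports the outermost markers whose rate reaches a threshold, so that the flanks outside this span are candidate low-recombination (e.g., heterochromatic) regions. Both the regression window and the threshold rule are simplifying assumptions for illustration; they are not BREC's implementation, and the function names are hypothetical. The sketch requires NumPy.

```python
import numpy as np


def local_recombination_rates(physical_mb, genetic_cm,
                              window_mb=1.0, min_markers=3):
    """Slope of the Marey map (cM/Mb) around each marker.

    Fits a straight line to the markers within +/- window_mb / 2 of each
    marker. This is a simple stand-in for BREC's own interpolation.
    """
    x = np.asarray(physical_mb, dtype=float)
    y = np.asarray(genetic_cm, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    rates = np.full(x.shape, np.nan)  # NaN where too few markers in window
    for i, center in enumerate(x):
        in_window = np.abs(x - center) <= window_mb / 2
        if in_window.sum() >= min_markers:
            slope, _intercept = np.polyfit(x[in_window], y[in_window], 1)
            # Genetic maps should be monotone; clip negative slopes from noise.
            rates[i] = max(slope, 0.0)
    return x, rates


def recombining_span(x, rates, threshold):
    """Outermost marker positions whose local rate reaches `threshold`.

    Hypothetical heuristic: markers outside this span are candidate
    low-recombination (e.g., heterochromatic) regions.
    """
    idx = np.flatnonzero(np.nan_to_num(rates, nan=0.0) >= threshold)
    if idx.size == 0:
        return None
    return float(x[idx[0]]), float(x[idx[-1]])
```

    On a synthetic chromosome with no recombination over the first 2 Mb and 2 cM/Mb afterwards, the estimated rate is close to 2 cM/Mb in the recombining region and close to 0 in the flat region, and with a threshold of 1 cM/Mb the recombining span starts near 2 Mb.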